Abstract

In this position paper, we consider the state of computer
vision research with respect to invariance to the horizontal
orientation of an image - what we term reflection invariance.
We describe why we consider reflection invariance to be an
important property and provide evidence where the absence of
this invariance produces surprising inconsistencies in
state-of-the-art systems. We demonstrate inconsistencies in
methods of object detection and scene classification when they
are presented with images and the horizontal mirror of those
images. Finally, we examine where some of the invariance is
exhibited in feature detection and descriptors, and make a case
for future consideration of reflection invariance as a measure
of quality in computer vision algorithms.


1 Introduction 

Human perception is invariant to horizontal reflection: people
are equally able to recognise objects and scenes regardless of
whether they are looking at an image as it was taken as a
photograph, or at a horizontally reflected image, as if looking
in a mirror. We observe that computer vision algorithms are more
sensitive to the reflection of an image and that invariance to
this has not received any attention in contemporary research.
In this position paper, we introduce the property of reflection
invariance, specifically studying horizontal reflection as an
introduction to the concept, although the discussion applies to
general reflection about alternative lines of symmetry.

We suggest reflection invariance is an important property that
should be considered in designing and implementing algorithms,
and used as a metric in measuring the success of vision
algorithms and applications. Just as scale invariance seeks to
neutralise the size of a feature to avoid bias in scale, we
propose reflection invariance to avoid bias in mirror reflection
about an arbitrary axis. It is important that algorithms should
be consistent in applications such as object recognition and
scene classification, and we demonstrate that current
state-of-the-art methods do not exhibit consistency when an
image is reflected horizontally.

2 Orientation and Reflection 

Low-level keypoint features describe a neighborhood of a few
pixels, where the co-location of pixel intensities is an
important attribute used to describe the feature. Most feature
descriptors, including the most popular SIFT [1] and HoG [2],
use the orientation of pixel gradients in a color space or
channel in some way to detect and represent distinct feature
characteristics. These algorithms are inherently sensitive to
orientation; others are sensitive only in practice, because of
poor implementation choices and mathematical rounding errors
that accumulate to affect the result and cause dependence on
image orientation.

A collection of descriptors can be composed to describe a
distinctive pattern or region, such as in the popular Bag of
Visual Words method [3]. In such a collection, the orientation
of individual features relative to each other is important, but
the orientation of the collection as a whole is less
significant. As the scale of description increases further,
orientation becomes less important and indeed becomes a
limitation when considering high-level features in an image. The
significance of orientation can therefore be considered
inversely proportional to the scale of description, with its
influence diminishing with the increase in distance from the
pixel detail (Figure 1).

Figure 1: Pyramid of Scales and Orientation Significance: as the
scale increases, the importance of orientation diminishes

Reflection has the same scale of sensitivity as rotational
orientation. Consider an example of scene recognition. A human
would describe a cityscape scene, and identify a familiar city,
regardless of the horizontal reflection of the image; if the
image is reflected about its vertical centre, the mirrored image
would still be recognisable to a human and would not influence
their description or identification. Computer vision algorithms,
however, are more sensitive and often produce different results
for these images.

The challenge is to generalize the description as the scale 
increases, with orientation becoming less relevant to the 
point where it is irrelevant at image scale. 


Detector    Invariant
--------    -------------------
BRISK       No
FAST        Perfect
GFTT        Yes, after matching
HARRIS      Yes, after matching
ORB         No
SIFT        No
STAR        Perfect
SURF        No
MSCR        Somewhat
MSER        Somewhat

Table 1: Conclusions of the invariance characteristics of ten
feature detectors from [4]
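
A comparison of the kind summarized in Table 1 can be sketched as follows. This is a minimal illustration, not the procedure used in [4]: the `Keypoint` type, the helper names, and the pixel tolerance are all assumptions made for the example. Keypoints detected in the mirrored image are mapped back into the original frame with x' = (width - 1) - x, and the score is the fraction of original keypoints that have a counterpart. The measure is one-directional, so extra keypoints found only in the mirrored image are not penalised.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical keypoint type for this sketch: a pixel position.
struct Keypoint {
    double x;  // column, in pixels
    double y;  // row, in pixels
};

// Map a keypoint found in the mirrored image back into the
// coordinate frame of the original image of the given width.
Keypoint unmirror(const Keypoint& k, int width) {
    return Keypoint{(width - 1) - k.x, k.y};
}

// Fraction of original keypoints that have a counterpart in the
// mirrored detection within `tol` pixels (1.0 = fully consistent).
double reflection_repeatability(const std::vector<Keypoint>& original,
                                const std::vector<Keypoint>& mirrored,
                                int width, double tol) {
    if (original.empty()) return mirrored.empty() ? 1.0 : 0.0;
    std::size_t matched = 0;
    for (const Keypoint& a : original) {
        for (const Keypoint& m : mirrored) {
            const Keypoint b = unmirror(m, width);
            if (std::hypot(a.x - b.x, a.y - b.y) <= tol) {
                ++matched;
                break;  // count each original keypoint at most once
            }
        }
    }
    return static_cast<double>(matched) / original.size();
}
```

A detector would be run once on the image and once on its mirror, and the two keypoint lists passed to `reflection_repeatability`; how a score maps onto the labels in Table 1 is not something this sketch attempts to define.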


3 Reflection sensitivity in state-of-the-art methods

3.1 Low level features

Feature detectors fulfil the common need to identify interest
points within an image. Information at these positions is
extracted into a descriptor - a fixed length vector of numeric
or binary values - that can be used, for example, to match
similar features in applications such as image retrieval,
alignment, stitching, and classification.

Many research papers combine the two stages of detection and
description into a single step, but the two are independent. The
invariance properties of detectors and descriptors are
important, and in work to date they are consistent with each
other. An algorithm that provides for feature detection and
feature description can provide invariance to scale, rotation,
illumination or affine regions in both steps.

In considering invariance to horizontal reflection, we assess
the two separately and propose that it is not necessary - or
even desirable - for a method to be consistent in its reflection
invariance across detection and description. The goal of feature
detection is to find keypoints or regions in an image that
contain interesting information. The definition of interesting
is specific to the goal of the detector, but it is reasonable to
expect that a location that is interesting in an image should
also be interesting in the same image that is horizontally
reflected.


Feature detectors To be reflection invariant, a feature detector
must show that the set of keypoints or regions found in an image
is equivalent to the set found in a mirror reflection of the
image [4]. In that study, an analysis of feature detectors with
respect to reflection invariance concluded that corner detectors
are stable, and the most popular detectors SIFT and SURF are
very unstable in detecting consistent feature points in images
and their mirror reflections (Table 1).


Feature descriptors Conversely, the orientation of a feature is
an important and discriminating attribute, and extracted
descriptors should generally maintain local orientation so that
established methods of feature matching, for example, can
accurately measure the magnitude and position of a feature
vector in high-dimensional space. However, reflection invariance
in low-level descriptors can be especially useful for detecting
intra-image lines of symmetry, such as water reflections in
scene analysis. Research has explored reflection-invariant HoG
[5] and, more frequently, SIFT-based methods such as RIFT [6],
MI-SIFT [7] and MIFT [8]. Generally, rotational invariance can
be achieved by finding the dominant gradient and rotating the
image patch so that the gradient is always in the same
direction. RIFT, for example, divides normalized patches into
four concentric rings of equal width, from each of which eight
gradient orientation histograms are computed. The orientation is
measured at each point relative to the direction pointing
outward from the center, thus maintaining rotation invariance.
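
To make concrete why gradient-orientation descriptors are sensitive to reflection, the sketch below builds a 9-bin histogram of unsigned gradient orientations and shows what a horizontal mirror does to it. It is an illustration only and is not the method of [5]-[8]. The function names and the representation of gradients as (gx, gy) pairs are assumptions. A horizontal mirror negates the x-component of each gradient, which maps an orientation θ to π - θ, so the histogram is reversed rather than preserved (orientations that fall exactly on a bin boundary are an exception).

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

using Gradient = std::pair<double, double>;  // (gx, gy)

const int kBins = 9;  // unsigned orientation bins over [0, pi)

// Histogram of unsigned gradient orientations, weighted by magnitude.
std::vector<double> orientation_histogram(const std::vector<Gradient>& grads) {
    const double pi = std::acos(-1.0);
    std::vector<double> hist(kBins, 0.0);
    for (const Gradient& g : grads) {
        double theta = std::atan2(g.second, g.first);  // in [-pi, pi]
        if (theta < 0.0) theta += pi;                  // fold to [0, pi]
        if (theta >= pi) theta -= pi;                  // fold pi onto 0
        int bin = static_cast<int>(theta / (pi / kBins));
        if (bin >= kBins) bin = kBins - 1;             // guard against rounding
        hist[bin] += std::hypot(g.first, g.second);
    }
    return hist;
}

// A horizontal mirror negates the x-component of every gradient.
std::vector<Gradient> mirror_gradients(std::vector<Gradient> grads) {
    for (Gradient& g : grads) g.first = -g.first;
    return grads;
}
```

For gradients away from bin boundaries, bin k of the mirrored histogram equals bin (kBins - 1 - k) of the original. A descriptor that stores the bins in a fixed order therefore changes under reflection, even though the change is predictable.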
3.2 Alignment and localization


In a recent work, [9] assessed object part localization and
observed that state-of-the-art methods augment the training set
with mirrored images, yet still do not produce bilaterally
symmetric results. The authors introduced the term mirrorability
and a mirror error that correlated with localization errors in
human pose estimation and face alignment.


3.3 Hough Forests 

Hough forests [?] use a random forest framework [?] that is
trained to learn a mapping from densely-sampled $D$-dimensional
feature cuboids to their corresponding votes in a Hough space
$\mathcal{H} \subseteq \mathbb{R}^H$. The Hough space encodes
the hypothesis $h(c, x, s)$ for an object belonging to class
$c \in C$ centred on $x \in \mathbb{R}^D$ and with size $s$. The
term cuboid refers to a local image patch ($D = 2$) or video
spatio-temporal neighborhood ($D = 3$) depending on the task.

Since their introduction in 2009 [?], Hough Forests have gained
some interest in object detection tasks [?]. Features are
extracted from feature channels derived from an image, and are
used to cast votes in Hough space. The standard set of 32
feature channels includes Histograms of Oriented Gradients-like
features with 9 bins using weighted orientations from a 5 x 5
neighborhood. The detected salient areas are therefore
inherently sensitive to in-plane orientation and reflection.

3.4 Deep learning 

While the recent adoption and development of neural network
techniques have undoubtedly produced impressive results in
computer vision tasks, and object and scene recognition in
particular, they are not at all robust to variation in data.
Studies have shown that changing an image in a way imperceptible
to humans can cause a deep neural network (DNN) to label the
image as something else entirely [?] and that it is easy to
produce images that are completely unrecognizable to humans, but
that state-of-the-art DNNs believe to be recognizable objects
with 99.99% confidence [17].

Recently, [15] published research on a scene recognition system
with an online demonstration^1. Figure 2 shows a set of four
images and their mirror reflections (top row) with the
information regions that the authors’ online demo produces. The
information regions are salient areas that the system has
identified in its quest to understand and describe an image. We
find the difference in the information regions compelling and
suggest that it demonstrates a bias to the horizontal
orientation of the image.

Table 2 shows the detailed results of the scene recognition. The
system determines the environment, semantic categories and SUN
scene attributes [18]. The category column summarizes the
highest scoring semantic category. Despite the differences in
salient areas of the images, the overall categorization has not
been affected. Each image and its mirror image are categorized
the same in these examples.


Table 3: Object recognition results from the online Wolfram
Language Image Identification Project

Columns:      Resolution | Original | Mirror
Resolutions:  550 x 412 and 244 x 183; 736 x 490 and 275 x 183;
              607 x 338 and 329 x 183
Labels:       broken arch, arch, building, church, fire truck,
              building, church, church


^1 http://places.csail.mit.edu/demo.html


Table 2: Predictions from [15] deep learning Scene Recognition
system

Rock Arch, original
  Environment:          outdoor
  Semantic categories:  rock_arch: 0.75, arch: 0.24
  SUN scene attributes: naturallight, openarea, ruggedscene,
                        climbing, rockstone, directsunsunny, dry,
                        vacationingtouring, natural, warm
  Category:             rock_arch

Rock Arch, mirror
  Environment:          outdoor
  Semantic categories:  rock_arch: 0.74, arch: 0.25
  SUN scene attributes: naturallight, ruggedscene, rockstone,
                        openarea, climbing, directsunsunny, dry,
                        vacationingtouring, warm, natural
  Category:             rock_arch

Palace of Westminster, original
  Environment:          outdoor
  Semantic categories:  tower: 0.50, bridge: 0.25, viaduct: 0.12
  SUN scene attributes: man-made, clouds, openarea, naturallight,
                        mostlyverticalcomponents, metal,
                        vacationingtouring, nohorizon,
                        directsunsunny, congregating
  Category:             tower

Palace of Westminster, mirror
  Environment:          outdoor
  Semantic categories:  tower: 0.50, bridge: 0.25, viaduct: 0.12
  SUN scene attributes: man-made, clouds, openarea, naturallight,
                        mostlyverticalcomponents, metal,
                        vacationingtouring, nohorizon, praying,
                        directsunsunny
  Category:             tower

Tower Bridge, original
  Environment:          outdoor
  Semantic categories:  skyscraper: 0.72, tower: 0.13,
                        office_building: 0.06
  SUN scene attributes: mostlyverticalcomponents, openarea,
                        man-made, naturallight, directsunsunny,
                        far-away horizon, clouds, metal, driving,
                        transportingthingsorpeople
  Category:             skyscraper

Tower Bridge, mirror
  Environment:          outdoor
  Semantic categories:  skyscraper: 0.66, tower: 0.13,
                        office_building: 0.11
  SUN scene attributes: mostlyverticalcomponents, openarea,
                        man-made, naturallight, directsunsunny,
                        driving, transportingthingsorpeople,
                        clouds, far-away horizon, metal
  Category:             skyscraper

City of London skyline, original
  Environment:          outdoor
  Semantic categories:  abbey: 0.64, palace: 0.16
  SUN scene attributes: man-made, clouds, openarea,
                        mostlyverticalcomponents, naturallight,
                        vacationingtouring, praying, nohorizon,
                        electricindoorlighting, metal
  Category:             abbey

City of London skyline, mirror
  Environment:          outdoor
  Semantic categories:  abbey: 0.66, palace: 0.15
  SUN scene attributes: clouds, man-made, openarea,
                        mostlyverticalcomponents, naturallight,
                        praying, vacationingtouring, nohorizon,
                        metal, electricindoorlighting
  Category:             abbey


Figure 2: Informative regions of images and their mirror,
identified by [15]. Note that the informative regions are not
mirror images, suggesting the algorithms are sensitive to the
horizontal orientation of the image.


However, there are differences in the
detail, which illustrate inconsistencies that, in boundary
cases, could change the categorization. The semantic categories
are rated with a likelihood. The Rock Arch - a stock image from
the authors’ own demonstration - reduces in likelihood by 0.01
in the mirror image; the Palace of Westminster^2 is classified
exactly the same in each pair. Tower Bridge - another stock
image from the authors’ own demonstration - appears less like a
skyscraper and more like an office building in the reflected
image than in the original, and the City of London skyline^3
increases its likelihood of being an abbey in the reflected
image. The inconsistency in the ratings, albeit small, further
strengthens our conviction that computer vision systems are
commonly biased to image horizontal orientation. It is also
interesting to note that images from the authors’ own
demonstration score higher in the semantic categorization than
images from other sources.
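
The comparison described above can be expressed as a small check on a pair of score lists. The sketch below is illustrative only: the `Scores` alias and both function names are assumptions, and it is not part of the system in [15]. It reports whether the top-scoring label agrees between an image and its mirror, and the largest change in any label's score.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <map>
#include <string>

// Hypothetical container for this sketch: label -> likelihood.
using Scores = std::map<std::string, double>;

// Label with the highest score (empty string if there are none).
std::string top_label(const Scores& s) {
    std::string best;
    double best_score = -1.0;
    for (const auto& kv : s) {
        if (kv.second > best_score) {
            best = kv.first;
            best_score = kv.second;
        }
    }
    return best;
}

// Largest absolute change in any label's score between the original
// and the mirrored image; a label missing on one side counts as 0.
double max_score_shift(const Scores& original, const Scores& mirrored) {
    double shift = 0.0;
    for (const auto& kv : original) {
        const auto it = mirrored.find(kv.first);
        const double other = (it == mirrored.end()) ? 0.0 : it->second;
        shift = std::max(shift, std::fabs(kv.second - other));
    }
    for (const auto& kv : mirrored) {
        if (original.find(kv.first) == original.end()) {
            shift = std::max(shift, std::fabs(kv.second));
        }
    }
    return shift;
}
```

With the Tower Bridge values from Table 2 (skyscraper 0.72, tower 0.13, office_building 0.06 against 0.66, 0.13, 0.11), the top label agrees for both images while the largest score shift is 0.06.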

We used a second neural-network-based object recognition system,
the Wolfram Language Image Identification Project^4, to test
classification of our images, this time using different sizes of
the same image. Table 3 shows the results: the Rock Arch is
classified differently in its original orientation at a small
scale, while the Palace of Westminster was classified
consistently at each scale. Tower Bridge is classified
differently in its original orientation at a large scale, and
the London Skyline is classified differently in its mirror
orientation at a large scale. These results show that this
system is sensitive to scale, and that the scale change also
influences the invariance to horizontal reflection.

^2 https://s-media-cache-ak0.pinimg.com/736x/d2/5a/0e/d25a0ed9bb2e788ae9c9ec59cc52670c.jpg

^3 http://upload.wikimedia.org/wikipedia/commons/d/da/The_City_London.jpg

^4 https://www.imageidentify.com/


Finally, Microsoft’s much publicised How-Old.net^5 asks "How Old
Do I Look?" and uses machine learning to guess the answer to the
question from a photograph. We used photographs of Alan Turing^6
and Prince Charles^7 and observed the difference in age that was
guessed for each image and its reflection (Figure 3). In both
cases, the ages decreased in the reflected image (right),
despite the orientation of the head being different in each
case.

This inconsistency in results is perhaps more surprising as the
image orientation affects the guess of the person’s age, but the
system does not appear to be intrinsically biased towards the
orientation of the head itself. On close examination, the
bounding boxes of the identified faces are different sizes -
smaller in the reflected image in both cases - by 5 pixels in
each of the x- and y-axes in the case of the photograph of Alan
Turing and 1 pixel in each axis in the case of Prince Charles.
The detected face of Alan Turing is in a consistent corner
position relative to the visible ear, and the detected face of
Prince Charles is consistent in the opposite top corner. We
therefore conclude that the face detection algorithm used in the
system is sensitive to head orientation and this may affect the
subsequent learned system of age estimation, which may or may
not be orientation-sensitive itself.


^5 http://how-old.net

^6 https://kpfa.org/wp-content/uploads/2015/05/Dr-Alan-Turing-2956483.jpg

^7 http://i.telegraph.co.uk/multimedia/archive/01422/princeCharles_1422434c.jpg
Figure 3: Microsoft’s How-Old.Net demonstration attempts to
guess a person’s age from a photograph image. These two examples
demonstrate that the system is sensitive to image orientation -
and not head orientation - as the ages are quite different for
each pair.
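
The bounding-box comparison discussed above can be sketched as follows. This is an illustration under stated assumptions: the `Box` type and function names are hypothetical, and any coordinates passed in would come from the reader's own detector, since the paper does not list the actual box coordinates. A box detected in the mirrored image is mapped back into the original frame and compared with the box from the original image using intersection over union.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Hypothetical axis-aligned box for this sketch.
struct Box {
    int x, y;  // top-left corner, in pixels
    int w, h;  // width and height, in pixels
};

// Map a box detected in the mirrored image back into the frame
// of the original image of width `image_width`.
Box unmirror_box(const Box& b, int image_width) {
    return Box{image_width - (b.x + b.w), b.y, b.w, b.h};
}

// Intersection over union of two boxes (1.0 = identical).
double iou(const Box& a, const Box& b) {
    const int ix = std::max(0, std::min(a.x + a.w, b.x + b.w) - std::max(a.x, b.x));
    const int iy = std::max(0, std::min(a.y + a.h, b.y + b.h) - std::max(a.y, b.y));
    const double inter = static_cast<double>(ix) * iy;
    const double uni = static_cast<double>(a.w) * a.h +
                       static_cast<double>(b.w) * b.h - inter;
    return uni > 0.0 ? inter / uni : 0.0;
}
```

A reflection-consistent detector would give an intersection over union of 1.0 for every image; a box that shrinks by a few pixels in the mirrored image, as observed here, lowers the value.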

4 Algorithms and Implementations

Many algorithms described in the research literature -
especially saliency based feature detectors - are not inherently
sensitive to orientation. Nonetheless, no mention is made of
reflection invariance in the papers, suggesting a general
unawareness of this property. Consequently, we have observed
several cases where commonly used, freely available code -
including reference implementations from the original authors -
has a sensitivity to reflection that is worsened by, or caused
by, choices made in the implementation. For example, algorithms
that use a Difference-of-Gaussian pyramid for sub-pixel feature
detection can inadvertently increase their reflection dependence
by the use of 32-bit floating point arithmetic for intermediate
calculations. Using the popular OpenCV [?] toolkit for C++, we
tested the GaussianBlur() function that convolves an image with
a specified Gaussian kernel. We found that using 32-bit
arithmetic produces reflection-sensitive convolutions for many
images that we tested (not shown), but using 64-bit arithmetic
all convolutions of our test images were reflection invariant
(Figure 4).

Conceptually, one would expect salient regions to be less biased
to horizontal orientation, because they use neighborhood color
and intensity measures and are less dependent on pixel
gradients. However, common implementations of salient region
detectors such as maximally stable extremal regions (MSER) [20]
can suffer in the initial step of the algorithm, which blurs the
image with a Gaussian kernel. In their saliency detector
reference implementation^8, [21] exhibit orientation sensitivity
for several reasons, including floating point errors in color
quantization that are realized differently depending on the
order in which the data is processed, which is determined by the
image orientation. Increasing the floating point arithmetic to
double-precision 64-bit calculations removes the sensitivity of
the quantization to reflection.
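
The order dependence described above can be reproduced in isolation. The sketch below is a contrived illustration, not code from OpenCV or from the implementation of [21]; the function name is an assumption, and the suggested values are chosen to make the effect obvious rather than to resemble pixel data. It accumulates the same values left-to-right and right-to-left, as would happen to a row of an image and of its mirror, keeping the running total in a chosen floating point type.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Sum the values left-to-right, or right-to-left when `reversed`
// is true, keeping the running total in type T.
template <typename T>
T ordered_sum(const std::vector<T>& values, bool reversed) {
    T acc = T(0);
    const std::size_t n = values.size();
    for (std::size_t i = 0; i < n; ++i) {
        acc += reversed ? values[n - 1 - i] : values[i];
    }
    return acc;
}
```

For the values {1, 1e8, -1e8}, and assuming float arithmetic is evaluated in single precision, the float total is 0 in one direction and 1 in the other, because 1 + 1e8 rounds back to 1e8. With double both directions give 1. Real image data produces far smaller discrepancies, but an exact equality test on a blurred image and its blurred mirror is sensitive to even one such rounding difference.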

^8 https://github.com/MingMingCheng/CmCode


#include <cassert>
#include <opencv2/opencv.hpp>

using namespace cv;

// Not a public OpenCV constant; a value of 1 is assumed here.
static const double SIFT_FIXPT_SCALE = 1.0;

int main() {
    Mat src = imread("image.png", IMREAD_GRAYSCALE);

    Mat fpt;  // use CV_64F below for double precision
    src.convertTo(fpt, CV_32F, SIFT_FIXPT_SCALE, 0);
    Mat fpt_r;
    flip(fpt, fpt_r, 1);  // horizontal mirror
    const double sigma = 1.24899971;

    GaussianBlur(fpt, fpt, Size(), sigma, sigma);
    GaussianBlur(fpt_r, fpt_r, Size(), sigma, sigma);

    // Mirror the second result back so that pixels align.
    Mat back;
    flip(fpt_r, back, 1);
    assert(countNonZero(fpt - back) == 0);
}


Figure 4: Example C++ code to test reflection invariance of a
Gaussian filter in OpenCV. Using 32-bit floating point
arithmetic - CV_32F in the convertTo call - will often result in
a failure of the final assertion, indicating that a Gaussian
filter on a horizontally flipped image does not produce the same
result as applying the same filter to the original image.
Changing to use 64-bit double precision arithmetic - CV_64F -
produces identical results on all of our test images, with no
assertion failures.

5 Conclusion 

We have proposed reflection invariance to be an important
consideration when designing and implementing algorithms. In
citing contemporary research projects, we have demonstrated
inconsistencies in applications of scene classification, object
detection and age-guessing when presented with images and their
horizontal reflections. We have described where some of the
sensitivity is exhibited in feature detection and descriptors
and applications of alignment and localization. We therefore
urge researchers to consider reflection invariance when
designing and implementing algorithms, and suggest reflection
consistency should be introduced as a measurement of success of
algorithms and their improvement over state-of-the-art.